Developer CD Series 1996 November: Technology Seed

Text File | 1996-11-01 | 9.9 KB | 193 lines | [ttro/ttxt]

YCrCb Codec Info

The MPEG decoder engine produces planar Y'CrCb data in a single block of data that is passed on to the ICM for display. Some basic properties of this block of data are specified in the ImageDescription created for the picture (or sequence of pictures), and this data can be accessed from ICM in the usual fashion. Such information includes the width and height of the Y'CrCb data, and the length of the block of data representing the image.

The block of data itself is in the form of a YCrCbVideoFrame struct, defined as

    struct YCrCbVideoFrame {
        //These are publicly defined fields.
        UInt32 headerSize;
        UInt16 luminanceOffset, cbOffset, crOffset;
        UInt16 luminanceRowBytes, chromaRowBytes;
        //Following, here, is the actual Y, cR and cB data.
    };
    typedef struct YCrCbVideoFrame YCrCbVideoFrame;

    #define kYCrCb420Type 'mpyc'

The headerSize is right now set to sizeof(YCrCbVideoFrame)==14 bytes. Conceivably that could change in the future.

The more interesting information is the luminanceOffset, cbOffset, and crOffset fields. The luminance, cB and cR data are stored as three planes of data, each specified by a ptr in memory and a rowBytes, just like a pixmap. The height of the luminance data is given by the height passed on to ICM. The height of the cR and cB data is half that. (Note that the height passed on to ICM will be, per the MPEG specs, a multiple of 16, i.e. a multiple of a macroblock height. Likewise the width will be a multiple of 16.)

One might imagine that the rowBytes information is redundant, that it would simply equal the width of the picture. This is not true, for two reasons. The first is that rows are padded by some number of bytes so that successive rows start at different offsets within a cache line, which makes the MPEG engine faster. The second is that the luminance and chroma data are interleaved with each other (in a fashion that will not be described further because it may change).

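To make the rowBytes point concrete, here is a minimal sketch of how one might address a single sample in a frame. It is an illustration only, not part of the codec interface: FrameHeader is a hypothetical stand-in for YCrCbVideoFrame written with C99 fixed-width types, the 2-byte packing pragma is an assumption used to reproduce the 14-byte header size on modern compilers, and luma_sample and chroma_sample are made-up helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the public YCrCbVideoFrame fields. 2-byte packing
   (an assumption) keeps sizeof at 14, as the text states for the header. */
#pragma pack(push, 2)
typedef struct {
    uint32_t headerSize;
    uint16_t luminanceOffset, cbOffset, crOffset;
    uint16_t luminanceRowBytes, chromaRowBytes;
} FrameHeader;
#pragma pack(pop)

/* Address of luminance sample (x, y). Rows are luminanceRowBytes apart,
   which may be larger than the picture width. */
static const uint8_t *luma_sample(const FrameHeader *f, uint32_t x, uint32_t y)
{
    assert(f != NULL);
    return (const uint8_t *)f + f->luminanceOffset
         + (size_t)y * f->luminanceRowBytes + x;
}

/* Address of the chroma sample covering luminance position (x, y).
   planeOffset is f->cbOffset or f->crOffset. In 4:2:0 each chroma sample
   covers a 2x2 block of luminance, so both coordinates are halved. */
static const uint8_t *chroma_sample(const FrameHeader *f, uint16_t planeOffset,
                                    uint32_t x, uint32_t y)
{
    assert(f != NULL);
    return (const uint8_t *)f + planeOffset
         + (size_t)(y / 2) * f->chromaRowBytes + (x / 2);
}
```

The only thing the sketch relies on is that all offsets are measured from the start of the frame block and that rows are stepped by rowBytes rather than by width; it assumes nothing about how the planes are interleaved.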
So in one's routine for actually displaying this data one will use code that looks something like

    pascal ComponentResult YCrCbDisplayFrame(SomeStruct* myVariousStuffStruct,
                                             YCrCbVideoFrame* framePtr)
    {
        UInt8* luminancePtr;
        UInt8* cRPtr;
        UInt8* cBPtr;
        UInt8  y00, y01, y10, y11, cr, cb;
        long   h, w;

        //All offsets are measured from the start of the frame block.
        luminancePtr = ((UInt8*)framePtr) + framePtr->luminanceOffset;
        cRPtr        = ((UInt8*)framePtr) + framePtr->crOffset;
        cBPtr        = ((UInt8*)framePtr) + framePtr->cbOffset;

        //Now run over these arrays, one 2x2 block of luminance per chroma sample.
        for(h=0; h<myVariousStuffStruct->srcHeight; h+=2){
            for(w=0; w<myVariousStuffStruct->srcWidth; w+=2){
                cr =*cRPtr++;
                cb =*cBPtr++;
                y00=luminancePtr[0];
                y01=luminancePtr[1];
                y10=(luminancePtr + framePtr->luminanceRowBytes)[0];
                y11=(luminancePtr + framePtr->luminanceRowBytes)[1];
                luminancePtr+=2;
                //Do some stuff with y, cr, cb to get them to the screen.
            }
            //Skip the row padding: one chroma row, two luminance rows.
            cRPtr+= ( framePtr->chromaRowBytes - myVariousStuffStruct->srcWidth/2 );
            cBPtr+= ( framePtr->chromaRowBytes - myVariousStuffStruct->srcWidth/2 );
            luminancePtr+=( framePtr->luminanceRowBytes*2 - myVariousStuffStruct->srcWidth );
        }
        return noErr;
    }

Now a few notes.

The type of this codec is 'mpyc'.

I have stuck to the exact terminology of Y'CrCb, although in colloquial speech one tends to use YUV as easier to pronounce. Y'CrCb is precisely defined in CCIR recommendation 601 (now renamed ITU-R BT.601). If you are unfamiliar with the details, I would recommend reading up on them. The easiest way to find such details is to use any web search engine with the keywords CCIR 601 and read two or three of the documents that result. Many of these documents (for example the ColorSpaces FAQ) describe in great detail the points I touch on below.

Here are some points to note about Y'CrCb. Most of these are subtle points that can be ignored in one's first pass at attempting to get things to work. Once things work acceptably, one may want to return to these points to ensure that onscreen display is not merely acceptable but as good as possible.

• WHAT ABOUT GAMMA CORRECTION?
The Y' component is gamma corrected. This will affect how and where you place gamma correction in your card design, and if your card is designed for both PCs and Macs you may want to have it toggle between two different modes to get the gamma correction correct on both systems.

• WHAT ABOUT DIGITAL VIDEO PINNING?
Y' has a nominal range of 16 through 235. 16 and values below are pure black, 235 and values above are pure white. Values in between increase linearly in brightness. The colorspace conversion may choose to ignore this nicety and simply map 0 to black and 255 to white, in the usual fashion. However, this can lead to regions of black looking noticeably noisy, because random values between 0 and 16 appear in those regions; these are all supposed to map to black, but instead map to noticeably different blacks. Likewise the chroma components have nominal values between 16 and 240; however, ignoring those limits is pretty harmless in terms of visual damage.

• WHAT ABOUT ALTERNATIVE COLOR SAMPLING, EG 4:4:4?
Right now we only deal with MPEG-1 and thus we only deal with 4:2:0 format (i.e. chroma is subsampled by 2 in both the horizontal and vertical directions). At some point we'll add support for MPEG-2, at which point we'll have to deal with other chroma subsampling formats like 4:2:2 and 4:4:4. This will be handled by defining additional codec types for those formats (though presumably your code for building those codecs will share the same code base as your code for this 4:2:0 codec).

• WHAT ABOUT SCALING?
Your codec MUST deal with scaling. If it does not, it is essentially useless. Pretty much all MPEG video is encoded with a sample aspect ratio that is not 1; e.g. the video is encoded at 352x240 samples but is supposed to be displayed in a 320x240 window on a monitor with square pixels. If your codec cannot cope with scaling, it will not be used and the software YCrCb codec will be used instead.

What is obviously optimal is if your card can do the scaling itself. If that is not possible, what might be best is for your codec to scale the data as it reads it in from the YCrCbVideoFrame* data pointer and writes it out to your card's buffers. In addition, it is not ideal to hardcode scaling at one value, say 352x240 scaled to 320x240. That is the scaling used for NTSC video, but footage derived from PAL video will use a different scaling, and footage from film, or generated by computer, might use a different scaling again.

While scaling is pretty much non-negotiable, your codec has the option of not bothering with clipping, or with some bit depths. Just as with any other codec, if you cannot deal with clipping, or with a 4-bit screen depth or whatever, you simply make this known in your PreDecompress() call and ICM will cover for you. Unlike scaling, clipping and screen depths other than those you are built to support (presumably 24bit, 16bit and maybe 8bit color or 8bit grey) are not common cases.

• WHAT ABOUT SOURCE EXTRACTION?
Consider a source image (compressed as YCrCb) that one wishes to display. QuickTime has always had a facility for specifying a source rectangle within that source image that is smaller than the entire image. Now practically every MPEG stream has visual junk of one sort or another around the edges, two or three pixels' worth. The MPEG system allows the user to specify a "masking out" of that visual junk by defining a source rectangle within the images that omits it. What this means for you is that your hardware/codec must be able to handle source extraction. If you simply ignore it and rely on ICM to do the work for you then, as with scaling, you will be useless for practically all MPEG playback.

• WHAT ABOUT GREY-SCALE INPUT?
The YCrCb input is almost always "24bit" input in the sense that the Y, cR and cB values are 8bit values.
However, that input may also be, in a sense, 8-bit grey input if some preference has been set asking the MPEG decoder not to decode color. In this case the image description describing the input data will still say that the data is 24bit, but the cbOffset, crOffset, and chromaRowBytes fields of the YCrCbVideoFrame will be set to 0.

Note that this is a property of the input format and is orthogonal to the screen display format. You may have color input (valid Y', cR and cB data) coming in for display on an 8-bit grey screen (although in that case you would simply ignore the chroma data and display only Y'). Alternatively you could have grey-scale data coming in (Y' valid but chroma offsets and rowBytes set to 0) for display on a 16bit color screen. (This latter case might occur if the user has decided they want smoother motion at the expense of color and so tells the MPEG decode engine not to bother decoding color.)

• WHAT ARE THE DETAILS OF THE COLOR-CONVERSION MATRICES?
The color conversion matrix used is

    R' = Y^                + 1.40200*Cr^
    G' = Y^ - 0.34414*Cb^ - 0.71414*Cr^
    B' = Y^ + 1.77200*Cb^

where Cr^ = scale(cr-128), that is, the value from the cR array (which is in the range [0, 255]) with 128 subtracted from it and then rescaled to account for the nominal [16, 240] range:

    Cr^ = [256/(240-16)]*(cr-128)

Likewise

    Cb^ = [256/(240-16)]*(cb-128)
    Y^  = [256/(235-16)]*(Y'-16)

An alternative formulation, with the scaling folded directly into the matrix, is

    R' = 1.16(Y'-16)                 + 1.594(cr-128)
    G' = 1.16(Y'-16) - 0.392(cb-128) - 0.813(cr-128)
    B' = 1.16(Y'-16) + 2.017(cb-128)

If you read different books you will see slightly different versions of the above. Quite how accurate one wishes to be depends on the CPU or transistor budget one throws at the problem. Details of exactly how the scaling is done are reasonably flexible within limits: as long as the clamping of black Y' at 16 is respected, one can ignore a few of the finicky details of the above if ignoring them is convenient.
One is welcome to use hardware/software that is designed for JPEG (which uses slightly different matrices), and the result may be acceptable to some users, but not to others.
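As a rough illustration of the pinning and matrix notes above, the following sketch converts Y'CrCb samples to 8-bit R'G'B' using the second formulation, with the coefficients rounded to 8.8 fixed point. It is one possible software approach under stated assumptions, not the conversion the shipping codec uses: the names RGB8, to_u8, ycrcb_to_rgb and convert_row are made up for this example, and the integer constants are simply the coefficients from the text multiplied by 256.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } RGB8;   /* hypothetical output pixel */

/* Round an 8.8 fixed-point value and clamp it to [0, 255]. */
static uint8_t to_u8(int v)
{
    if (v < 0) return 0;
    v = (v + 128) >> 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Convert one Y'CrCb sample to R'G'B'. Coefficients are the ones quoted in
   the text times 256: 1.16->297, 1.594->408, 0.392->100, 0.813->208,
   2.017->516. */
static RGB8 ycrcb_to_rgb(uint8_t y, uint8_t cr, uint8_t cb)
{
    /* Pin Y' to its nominal range so that noise below 16 all maps to the
       same black and values above 235 all map to the same white. */
    int yy = y < 16 ? 16 : (y > 235 ? 235 : y);
    int c = yy - 16, d = cb - 128, e = cr - 128;
    RGB8 out;
    out.r = to_u8(297 * c + 408 * e);
    out.g = to_u8(297 * c - 100 * d - 208 * e);
    out.b = to_u8(297 * c + 516 * d);
    return out;
}

/* Convert one row of `width` luminance samples. In 4:2:0 each chroma sample
   is shared by two horizontally adjacent luminance samples. */
static void convert_row(const uint8_t *y, const uint8_t *cr, const uint8_t *cb,
                        RGB8 *out, int width)
{
    assert(y && cr && cb && out && width % 2 == 0);
    for (int x = 0; x < width; x++)
        out[x] = ycrcb_to_rgb(y[x], cr[x / 2], cb[x / 2]);
}
```

Because the text rounds the luminance gain to 1.16, nominal white (Y' of 235) comes out at 254 rather than 255 in this sketch. A more precise gain would close that gap, which is the sort of finicky detail the notes above leave to the implementer's budget.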